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Abstract 

In this paper, we investigate a Bayesian sparse reconstruction algorithm called compressive sensing 
via Bayesian support detection (CS-BSD). This algorithm is quite robust against measurement noise 
and achieves the performance of an minimum mean square error (MMSE) estimator that has support 
knowledge beyond a certain SNR thredhold. The key idea behind CS-BSD is that reconstruction takes 
a detection-directed estimation structure consisting of two parts: support detection and signal value 
estimation. Belief propagation (BP) and a Bayesian hypothesis test perform support detection and an 
MMSE estimator finds the signal values belonging to the support set. CS-BSD converges faster than 
other BP-based algorithms and it can be converted to an parallel architecture to become much faster. 
Numerical results are provided to verify the superiority of CS-BSD, compared to recent algorithms. 

Index Terms 

Compressive sensing, sparse signal reconstruction, support detection, belief propagation, detection- 
directed estimation 



I. Introduction 

Compressive sensing (CS) in the presence of noise has been intensively investigated in many recent 
papers because any real-world device is subject to at least a small amount of noise. We refer to such 
problems as noisy compressive sensing (NCS). Let x = [x±,...,xn] denote a random vector whose 
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elements are sparsely non-zeros, called sparse signal. Then, the NCS decoder observes a measurement 
vector z = [z\, zm] E M a/ , given as 

z = *x + n, (1) 

where xo E 1^ is a deterministic sparse signal; 3? E M MxAr is a sensing matrix whose columns represent 
a possibly overcomplete basis, i.e., rank(<&) < M, where M < iV;and n E M M is an additive noise vector 
generated by a certain distribution. 

The NCS reconstruction problem has been discussed in terms of conventional Zi-norm approaches 
Hl-G). In CQ-0, the authors assume a bounded noise and in 0,(51, an i.i.d. zero-mean Gaussian noise 
is assumed, i.e., n ~ A/"(0, cr^Ij^). In [5], Candes and Tao proposed an Zi-norm based reconstruction 
algorithm for the Gaussian setup, called the Dantzig selector (Ll-DS): 

x = argmin Hxllj s.t. E ||<fr*(<l>x — z)^ < e, (2) 

where e is the tolerance user defined paramter and * denotes matrice tranposition. The reconstruction 
performance of Ll-DS is proprtional to logarithmic factor, i.e., E ||x — X0II2 < C ■ cr^K (log AT) with a 
constant C (see Th.l in Q). 

Alternatively, Bayesian approaches to NCS have received attention ll7l- |[T5l . This type of approach 
offers powerful mitigation of noise effects by using many existing statistical signal processing techniques 
and several statistical signal-noise models. In these approaches, the reconstruction problem is described 
as the maximum a posteriori (MAP) estimation problem as follows: 

x = argmax / x (x|z) s.t. E ||*&x — z|| 2 < e, (3) 

X 

where Gaussian noise is assumed and /(•) is a probability density function. 

The most well-known Bayesian approach is the sparse Bayesian learning (SBL) algorithm iTTTl - iTlOll . 
The SBL algorithm iteratively determines the posterior density of the signal on basis of a three-layer 
hierarchical prior model, so the prior density is a function of certain parameters. The algorithm estimates 
the parameters of the prior using expectation maximization (EM) and applies these parameters to finding 
the posterior . The SBL approach to sparse reconstruction was originally proposed in |7]|,||S). Recently, 
Ji et al. |0 and Babacan et al. JTOll successfully applied the SBL approach to the NCS reconstruction 
problem with different prior model. 

Another class of Bayesian approaches is sparse reconstruction using sparse matrices iTTSTl - lfT3Tl . The 
work is inspired by the success of low -density parity-check (LDPC) codes in channel coding field |fT~8Tl - 
ll20l . The use of the sparse matrix enables simple and fast signal acquisition that is feasible in real- 
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world applications. In addition, these approaches can be made more attractive if they are applied in in 
conjunction with belief propagation (BP). BP replaces the reconstruction process by iterative message- 
passing processes. This replacement reduces the reconstruction complexity to the 0(N log N) order. 

Baron et al. for the first time proposed the use of sparse matrices to the NCS setup and developed a 
BP-based algorithm, called CS-BP fl2ll . |fl3l . CS-BP iteratively updates the signal posterior from the two- 
state Gaussian mixture prior via the message-passing algorithm, where the messages are the probability 
densities of the signal elements. In |[T4ll . Tan et al. proposed another BP-based algorithm called, BP- 
SBL. They applied BP to the SBL-framework in [9] to reduce the complexity of the EM algorithm. Most 
recently, Akcakaya et al. devised SuPrEM using an idea similar to BP-SBL, but in a different framework 
|fT31 which is based on Gaussian scale mixture |[T6l with a specific type of prior called the Jeffreys' 
prior [17]]. In addition, the authors restrict the story of SuPrEM to a class of sensing matrices, called 
low-density frames, in which the matrices have fixed column and row weights. 

In this paper, we propose a sparse reconstruction algorithm based on the Bayesian approach and the 
use of sparse matrices. We call our algorithm as Compressive sensing via bayesian support detection 
(CS-BSD). CS-BSD has the following properties: 

1) Robustness against the measurement noise effects. 

2) Ability to perform as the minimum mean square error (MMSE) estimator that has knowledge of 
the support set. 

3) Fast convergence. 

CS-BSD has a detection-directed (DD) estimation structure which consists of signal support detection 
and signal value estimation, as shown in Fig{T] We consider the common procedure of first using the 
measurements at hand to detect the signal's support set. This detected support set is then used in the 
model of the sparse signal, and the value estimator is built as if the detected support set is in fact the 
correct set. The support detection component consists of a combination of the Bayesian hypothesis test 
(BHT) and BP, and signal value estimation using the detected support set is achieved via an MMSE 
estimator. CS-BSD iterates the detection and estimation processes until the constraint in ([3]) is met. 

The DD estimation methodology was investigated in [21 ] for estimation of noisy signals and have been 
widely applied to wireless communication systems [22], [23]. For CS, the methodology was first reported 
in Il24ll . ll25l : we tailor the methodology to the NCS problem by refining that work. The complexity of 
CS-BSD is 0(iVlogiV + KM) whereas that of the other BP-based algorithm is O(iVlogiV) because 
CS-BSD includes the cost of MMSE in addition to that of BP. However, CS-BSD converges faster than 
the other BP-based algorithms; thus, its computational cost is lower in practice. In addition, CS-BSD can 
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be much faster by converting to an parallel architecture. 

The rest of the paper is organized as follows. Section II introduces the sparse sensing matrix, the prior 
model for our system model. The details of CS-BSD are given in Section III. A few practical issues are 
discussed in Section IV. We compare the numerical results of CS-BSD to the other recent CS algorithms 
in Section V. Section VI concludes the paper. 



A. Sparse Sensing Matrix <fr 

For signal sensing, we employ sparse-Bernoulli matrices $ G {0,1, —l} MxN , which have been 
successfully used in CS recently |[L2l - lfl4"l . In the matrix, sparsely nonzero elements are equiprobably 
equal to 1 or —1. We set the sparsity of $ using the fixed column weight L. Because the column weight 
rather than the row weight is fixed, all elements of xo have an even chance of being sensed. In addition, 
the fixed column weight unifies the energy of the basis of the measurement space spanned by the column 
vectors of 

With the sparse-Bernoulli matrix, the linear system z = <&xo + n can be represented over a bipartite 
graph. Let V := {1,2,..., N} denote a set of indices corresponding to the elements of the signal vector, 
x o = [ x o,i, x o,2, xq,n]- Similarly, C := {1,2, M} denotes a set of indices corresponding to the 
elements of the measurement vector, z = [z%, Z2, zm\ In addition, we define a set of edges connecting 
V and C as 6 := {(i, j) G V x C \ \4>%j\ = 1} where ^ is the (i, j)-th element of Then, A bipartite 
graph Q := (V,C,£) fully describes the neighboring relation in the linear system. Furthermore, we define 
the neighbor set of V and C as N v (i) := {j G C\(i,j) G £} for alii G V and Nc(j) ■= {i G V\(i,j) G £} 
for all j G C, respectively. Note that |jVy(i)| = L for alH G V under our assumption regarding Figj2] 
depicts a simple example of the graphical representation corresponding to N = 6, M = 4, L = 2. 

B. Prior Model 

We limit our discussion to the random vector x whose elements are i.i.d. random variables. This as- 
sumption is commonly used in many papers iTTTl - lTT3Tl . We characterize the signal sparsity in a probabilistic 
manner, called sparsity rate. The sparsity rate q is defined as q := Pr{xj / 0} for all i G V. Namely, 
each signal element independently belongs to the signal support set with the rate q. The supportiveness 
of each signal element is represented by a state variable Sj, defined as 



II. System Model 




for all i G V. 



(4) 
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Hence, we model the prior density of xi using a spike -and- slab model originating in a two-state mixture 
density as follows: 

fx(x) := qf x (x\s = 1) + (1 - q)f x (x\s = 0) 

= qM(x;0,al) + (l-q)S(x), (5) 

where 5(x) indicates a Dirac distribution having nonzero value between x G [0— ,0+] and f 5{x)dx = 1. 
In the prior density, we use Gaussian density M(x;0,a^.) for f x (x\s = 1) although it includes the 
probability mass at Xi = 0. The reason is the probability mass at x-i = is very small and Gaussian 
densities are mathematically tractable. In addition, we drop the index i from the prior density under the 
assumption of i.i.d. elements. The spike-and-slab prior has been widely employed in Bayesian inference 
problems [26]-[28] and was recently applied to CS [11] as well. 



III. Proposed Algorithm 

In this section, we discuss the details of the proposed algorithm based on the DD estimation structure. 
The proposed algorithm, CS-BSD, is an iterative algorithm that repeats the support detection and signal 
value estimation processes until E ||<&x — z|| 2 < e is met. 

A. Detection of Support Set 

The decoder detects the signal support in each element unit. Namely, the supportive state of each signal 
element is detected independently and converted to the support set information for the signal. First, the 
following simple hypothesis test can be considered for the state detection of Xi. 

Pr{xi = 0|z} ^ Pi{ Xl / 0|z} for all i G V, (6) 

H 1 

where Hq and H\ are two possible hypotheses. If we marginalize over s«, the left hand side of (|6]) 
becomes 

Pr{x, = 0|z} = £ Pr{ Xi = 0|z, s;} Pr{ Si |z} 

S;e{0,l} 

= Pr{xi = 0|z, s t = 1} Pr{si = l|z} + Pr{x; = 0|z, s { = 0} Pr{ Si = 0|z} (7) 

V v ' 

=1 

= Pr{x, = 0|z, Sl = 1} Pr{si = l|z} + Pr{ Si = 0|z}, 



where 



Pr{xi = 0|z, Si = 0} = Pr{xi = 0|sj = 0} 

= /o°- fx(x\s = 0)dx (8) 
= C5(x)dx = l. 
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The right hand side of (|6]) is 

Pr{x,/0|z} = J2 Pr{xi /0|z, Si }Pr{ Si |z} 

Si£{0,l} 



where 



Pr{x; ^ 0|z, Si = 1} Pr{s; = l|z} + Pr{x; / 0|z, Si = 0} Pr{ Si = 0|z} (9) 

V V ' 

=0 

Pr{xi/0|z, Si = l}Pr{si = l|z}, 

Pr{x 4 + 0|z, Si = 0} = Pr{x; / 0|s, = 0} (10) 

5(x)dx = 0. (11) 



From ^ and (|9]), the hypothesis test in ([6]) is refined as 

Pr{«-?B ^° Pr {^ ^ °l z ' a f = 1} - Pr{x, = 0|z, Si = 1} 
1,-11 Hi (12) 

= 1 - 2 x Pr{x; = 0|z,Sj = 1}. 

Here 



r0+ 

Pr{xi = 0|z,Sj = 1} = / f Xz (x\z,Si = l 

Jo- 



dx, (13) 



where the posterior density, f x . (x\z, Si = 1), is Gaussian because the signal and noise are assumed to be 
Gaussian (see p. 326 in [29]). The term of Pr{xj = 0|z, s« = 1} in right hand side of ( fT2] ) is caused by the 
use of Gaussian density Af(x; 0, a x ) for the prior of nonzero xi. Because the variance of f Xi (x|z, Sj = 1) 
is a function of the noise variance, the probability Pr{xj = 0|z, Sj = 1} is very small, and it approaches 
zero as the SNR increases. Therefore, we suggest setting the threshold of the hypothesis test in ( [12] ) to 
1. This implies that the hypothesis test can detect the supportive state of the signal elements with a high 
success probability if SNR is sufficiently high. 

We now describe how to obtain the probability ratio, p~t~^t - By factorizing over x.j, the ratio 
becomes 

Prjsj = 0|z} _ /Prjsj = 0|z, Xj}f Xi (x\z)dx H, ^ 
Pr{sj = l|z} fPr{si = l\z, Xi }f Xi ( X \z)dxH 1 
where f Xt (x\z) denotes the posterior density of x% given z. The signal elements are not i.i.d. anymore 

given z. In ( fT?] ), Pr{sj|z,Xj} = Pr{si|xj} holds true since the measurements z does not provide any 

additional information on the state given Xj. Using the Bayesian rule and the prior information, we finally 

obtain the hypothesis test as the following form: 

pr{s t= oiz } j ^^ t^** * 

Pr{s, = l|z} f f-W^W f Xt (z\*)dx * ' 
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Since we know the prior of the state Pr{s} from the sparsity rate, i.e., Pr{s = 1} = q, we can move the 
prior term to the right side, and then treat it as a threshold for the hypothesis test 7 := g^fe^t = (\l q \ - 
Therefore, the state of each elements can be sensed from the corresponding posterior and prior densities. 

Definition 1 (BHT for state detection): Let denote the detected state of Xi, f Xi (x\z) indicates the 
posterior density of Xi\, and f x (x\s) denotes the conditional prior density of a signal element given the 
state. Then, state detection for all i G V is performed by choosing the hypothesis that result from 

Pr{ Si = 0|z} _ / U ^° ] UMz)dx Ho 



Pr{ Sl = l|z} fl^lf Xi ( x \ z) dxH 



^7, (16) 



/*0) 
, H : Si = 

where { , 7 := q/(l - q). (17) 

H 1 : Si = 1 

B. Belief Propagation for Posterior Update 

The posterior density used for the BHT is obtained and updated at every iteration via BP. Our BP 
process is similar to that in |[T2ll , |fT3l and was independently devised from |[T4ll , |[T5l . Distinctively, our BP 
process uses the information on the noise distributions f n (n) = AA(n; 0, cr^) under the i.i.d. zero-mean 
Gaussian noise assumption. 

Using Bayesian rule, we can represent the posterior density of x-i in the form of Posterior = Prior x 
S^T> g^en as 

f x Xx\z) = f x (x)x^^. (18) 
M z ) 

If the sensing matrix $ is sufficiently sparse such that the corresponding bipartite graph is tree-like, we 
postulate that the elements of z associated with xi are independent of each other given Xi fl9ll . Under 
the tree-like assumption, we can decompose the likelihood density / z (z|xj) to the product of densities: 

f Xi (x\z) oc f x (x) x JJ fzj(z\xi). (19) 
jeN(i) 

We call each decomposition of the likelihood, f z .(z\xi) the measurement density. Theorem 1 below 
demonstrates that the measurement density can be composed of the densities of the associated signal 
elements. 

Theorem 1 (Measurement density in BP): The measurement density f z .(z\xi) is expressed as the lin- 
ear convolution of all the associated distributions of the signal elements and the corresponding noise 
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distribution f n (n) as follows: 



f Zj (z\ Xi ) = 5(z - Zj ) ® / n . (n) 8) [ (ar) I , (20) 

K keN c (j)\{i} 

for all G £, where (8) and (g) are the operator for linear convolution and the linear convolution of 
a sequence of functions, respectively 
Proof: See Appendix A. 



Therefore, the essence of the BP-process is to update the signal and measurement densities by ex- 
changing probability density messages, associated with the neighboring relation in the bipartite graph. 
Let aj_>j denote the message from the i-th signal element to the j-th measurement element, called the 
signal message; hj^i is the message from the j-th measurement element to the i-th signal element, called 
the measurement message. The signal message is an approximation of the density of the signal element, 
i.e., ~ f Xi (x\z) and it is obtained from ([19]) simply by replacing the measurement density with the 
measurement message of the previous iteration. Note that in BP-process the message coming from the 
j-th measurement is excluded in the calculation of Thus, the signal message at the l-th iteration is 
expressed as 



f x (x) x J] b 



l-l 



(21) 



k&N v (i)\{j} 

for all (i, j) 6 £, where rj[-] is the normalization function to make J Sii^jdx = 1. Similarly, the 
measurement message approximates the measurement density, i.e., f Z] {z\xi), and it is obtained 

from the expression of p0] ) by replacing the associated densities of signal elements f Xk (x) with the signal 
messages for the purpose of iteration , that is, 



b}.^ :=5(z- Zj ) f n . (n)« [ Q<) a J fc _, , j . (22) 

K k&N c (j)\{i} 

The convolution operations in ([22]) can be efficiently computed by using the Fast fourier transform 
(FFT). Therefore, we express for the measurement message calculation as 



F6(z- Zj )xFf n] (n)x J] Fa^- 



(23) 



keN c (j)\{i} 

where F € C NdXNd denotes a Fourier matrix of size N^. In fact, the use of the FFT brings a small 
calculation gap between this result and that of ( [20] ) since the FFT-based calculation performs a circular 
convolution that produces output having a heavy tail, as shown in Fig[3] The heaviness increases as the 
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corresponding row weights in 3? increase. However, the difference is can be ignored, especially when 
the densities are bell-shaped distributions. 

Finally, the update of the posterior density of X{ at the Z-th iteration is provided as given in Definition 

2. 

Definition 2 (Posterior update in BP): Let b^-_ >i denote a measurement message at the Z-th iteration 
for all G £. Then, the posterior density of Xi at the Z-th iteration is calculated by 



f x i(x\z) = rj 



fx{x) x j ] b 



l 

■J'-M 

jeN v (i) 



(24) 



where r][-] is the normalization function that makes J f x i(x\z)dx = 1. 



C. Detection-Directed Estimation of Signal Values 

We now describe signal value estimation based on the DD estimation structure. The DD estimator is 
basically an estimator that determines how to act on the input data directed by the information from the 
detector. In CS-BSD, the detector provides the support information s\ and the value estimator then finds 
the signal values as if the detected support set is the correct set at each iteration. That is, 

x 1 = argmax/ x (x|z, s = s*) s.t. E ||(«&x — z)|| 2 < e, (25) 

where the estimator decides that x\ = for all i € V : s\ = 0. From the argument in ( 12 1, the DD estimate 



converges to the true signal xo since the detected support set becomes the correct set as SNR and the 
number of iterations I increases. This DD methodology makes no general claim regarding optimality of 
the solution; however, it is common and often successful. Let x l £ l s M I o denote a random vector 



consisting of the elements with P { = 1. Then, the problem in ( [25] ) is reduced to 

X-Lpp = ar S max /x' (x|z, s = s*) 

(26) 

= argrnax/ z (z|x^ pp , s = s')/ x ^ pp (x|s = s*). 
Since x l supp and the noise elements are assumed to be zero-mean i.i.d. Gaussian random variables with 



variance a x and er„ respectively, the MAP estimation in p6| ) is recast as 



. IN jr-l l ll^i l|2 

^■supp ar S rnm ||z supp x -supp\\2 ' of Il x s?wll2 ' (27) 



where & supp denotes a submatrix of $ corresponding to i G V : Sj = 1. In addition, the MAP and MMSE 
estimates are identical, assuming the signal and noise are Gaussian (see p. 358 in ||29l ). Therefore, the 
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estimate x l sup p can be obtained by the MMSE estimator 

x ; = I — T -I- — $> l * (f> 1 l —fi> 1 * z 

■^supp I 2 ^-2 supp ^ supp J o swpp V-^ u / 

To combine the support information s* and the estimated values x l supp , we define an index set U l := 
{1, ||s*|| } corresponding to the elements x z supp = [x supPt i, x aupPs \\§i\\ ] and a bijective mapping 
function h : {i G V|s^ = 1} — > U l . Then, the reconstruction at each iteration is readily obtained from 

x suppMi)' Si = (29) 
0, o.w. 

for all i G V. CS-BSD is summarized in Algorithm [T] 

IV. Practical issues 

A. Complexity 

We implement the BP process in CS-BSD based on the sampled-message approach in lfl3l . The density 
messages , by-^ are vectors of size Nd where Nd is chosen to be power of two for efficient use of 
FFT Next, we analyze the complexity of CS-BSD by considering each part seperately. 

1) Support detection: Let us consider the complexity of BP first. Since the matrix has the fixed 
column weight L and the size for a density vector is Nd, the decoder requires O(LNd) flops per iteration 
to calculate the signal message in ( pi) , and 0( N ^ d log Nd) flops per iteration to calculate the 
measurement message hj^i in ( [20] ), since the row weight is NL/M on average and the cost of the 
FFT-based convolution is 0(Nd log Nd). Hence, the per-iteration cost for all probability messages is 



0(NLN d + M^j^ \ogN d ) flops. For the BHT in ([T6]>, the decoder requires 0(N d ) flops to calculate a 



likelihood ratio. The cost for the hypothesis test is much smaller than that of BP; therefore, it is ignored. 

2 ) Signal value estimation: Let us fix the signal sparsity as the expected value of the cardinality of the 
support set, i.e., K := i£[||x|| ] = Nq, for purpose of comparison. Then, the complexity of the MMSE 
estimation in ( [28] ) depends strongly upon K such that conventionally it requires O(KM) flops if QR 
decomposition is used lf30| . Thus, the total complexity of CS-BSD is O (N iter NLN d log N d + N iteT KM) 
flops where Nn er denotes the number of iterations. If L and Nd are fixed, the complexity of CS- 
BSD can be simplified to 0(Nit er N + Nit er KM) flops. The BP process is known to converge within 
N iter = 0(logN) (23 such that its complexity is <3(iVlogiV + KM log N). If we fix the number of 
iteration Ni ter empirically, we can remove the MMSE operation from the iterations. In that case, the 
complexity is reduced to 0(N log N + KM). 
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B. Parallelization of Belief Propagation 

The BP process for finding the posterior finding can be implemented using a parallel architecture. 
Indeed, many parallelized BP algorithms, with applications to LDPC codes, have demonstrated superior 
performance in ll3TTl - ll33l . The graph representation of the sparse sensing matrix shows that the depen- 
dencies of the message calculations for any signal elements (or measurement elements) depend only 
upon the corresponding measurement elements (or signal elements). This allows all messages in BP to 
be computed in a parallel manner. Therefore, implementing BP on a parallel architecture for BP yields 
low power consumption, high-speed decoding, and simple logic ll31~ll . 

V. Numerical results 

We demonstrate the advantages of CS-BSD using simulation results in several different settings. To 
show its average performance, we take 200 Monte Carlo trials for each point in the simulation. In each 
trial, we generate the deterministic sparse signal xo with N = 1024 and a x = 10 whose values are 
represented with finite precision. The finite precision is provided by 6-bit quantization such that each 
signal value has 64 levels. This assumption of finite precision for the signal values is reasonable in terms 
of digital signal processing and implementation. In addition, we restrict the magnitude level of the signal 
elements to \xi\ < 3a x for the same reason. We define the SNR as 

SNR : = 10 log 10 ]^ 112 dB (30) 

and M/N as the undersampling ratio for signal acquisition. 

A. SER Performance of Support Detector 

To determine the performance of the support detector in CS-BSD, we defined the state error rate 
(SER) as: 

'#{< € V\si / s ,j}" 



SER := avg 



(31) 



N 

where so,j is the state variable corresponding to the true signal value Xq^. We simulate the SER perfor- 
mance as a function of the SNR for a variety of undersampling ratio M/N. In this simulation, we set 
q = 0.05, Nd = 64, and L = 4. In addition, we compare the SER performance to a theoretical limit on the 
support recovery given by Fletcher et al. |[35l . They found a necessary condition for maximum-likelihood 
(ML) estimation to asymptotically recover the support set if the sensing matrix has i.i.d. Gaussian entries. 
The ML estimation is described as 

max||Pj-z||2 s.t. \J\ = ||x|| , (32) 
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where the signal sparsity ||x|| is assumed to be known, J C V is a subset of the index set of the 
signal, and P jz denotes the orthogonal projection of z onto the subspace spanned by columns of $ 
corresponding to J. Namely, the ML estimate is a subset of V such that the subspace spanned by the 
corresponding columns of $ contain the maximum energy of z. We rewrite the necessary condition in 
terms of SNR such that 

2 x llxIL log(iV — llxIL) 
SNR > SNR /imii := — " ,, °„ &K . — (33) 
(M — ||x|| + 1) x MAR v ' 

where minimum-to- average ratio (MAR) is defined as 

min \xj\ 2 

MAR := f g ... ■ (34) 

ll x ll 2 /ll x llo 

In this comparison, we used 200 Monte Carlo trials to find the average SNR^ m j t , i.e., SNR^ m j t := 
avg[SNRti m it\. In Figj4j the SER curves show a waterfall behavior; the curves decline rapidly to less 
than 10~ 5 beyond a certain threshold SNR. This behavior supports the argument in §V2\ that the BHT 
achieves successful support detection in the high SNR regime. We consider the SER=10~ 5 bound as 
an almost error-free bound since it is much less than the rate of one state error 1/N « 1CT 3 when 
N = 1024. The threshold SNR for the error-free bound is roughly 34.8 dB for M/N=03, 32.9 dB for 
M/N = 0.4, and 31.1 dB for M/N = 0.5. Remarkably, this threshold SNR approaches toSNR Kmit 
as M/N increases. For example, the gap between the limit and the simulation result is 0.58 dB for 
M/N = 0.3; however, the gap is only 0.2 dB for M/N = 0.5. For M/N = 0.2, since the sensing 
matrix $ is not sufficiently sparse, the tree-like assumption regarding $ is rarely satisfied. Such a fact 
occasionally causes the BP-process to diverge, leading to severe errors in support detection. 

B. MSE Performance Comparison 

We consider the reconstruction performance in terms of normalized means square error (MSE), which 



is defined as 



MSE := avg 



|x - x || 2 

II II- 
ll x o|| 2 



(35) 



We compare our algorithm to several recent CS reconstruction algorithms: 1) CS-BP lfT2l , |[T3l , 2) Ll-DS 
via linear programming |U, 3) Bayesian CS (BCS) (3, 4) CoSaMP OH, and 5) SuPrEM (reweighted 
version) [15]. For BCS and SuPrEM, we obtained the source code from each author's webpage; for 



CoSaMP we used Stephen Becker's code (available at http://www.ugcs.caltech.edu/~srbecker/algorithms 



shtml). Ll-DS is provided by the Ll-MAGIC package (available at http ://users.ece.g atech.edu/~justin/ 
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llmagic/).We implemented CS-BP algorithm by using the sampled-message approach and upgrading the 
original algorithm to use the noise information. For CS-BP, we used the sparse-Bernoulli sensing matrix 
with L = 4; for SuPrEM, we use a sensing matrix generated from a low-density frame [15] with the same 
parameters (N, M, L). Ll-DS, CoSaMP and BCS were used with a Gaussian sensing matrix having the 
same column energy as the sparse-Bernoulli matrix, for fairness, i.e., \\<t>j t Gaussian\(2 = H^j.Sparaella = 
The sparsity of an input parameter in CoSaMP and SuPrEM was set according to the expectation of the 
cardinality of the support set K := i?[||x|| ] = Nq. Those algorithms are summarized in Table [Ij with 
respect to thier complexity, type of sensing matrix, prior type, and algorithm type. 

1 ) Comparison with respect to SNR: In Fig j5] we show the MSE performance as a function of SNR 
where M/N = 0.5, q = 0.05, and N d = 64. In the high SNR regime, the advantage of CS-BSD becomes 
remarkable. As the SNR increases, the MSE of CS-BSD approaches to that of an MMSE estimator that 
has knowledge of the support set, defined as 



Tr 

MSE* := — 



T 2 I + <7 2 ^SUpp^SUpp 



I II 2 



(36) 



where Tr[-] denote the matrix trace operation. Beyond SNR=31 dB, since the SER of CS-BSD is almost 
error-free, the MSE performance achieves MSE* at M /N > 0.5. Surprisingly, this result is superior to 
that of the l\ norm based approach, which is known as an optimal algorithm in the noiseless case. The 
gap between the two algorithms is caused by the reconstruction error over the non-supporting elements. 
CS-BSD completely removes the error from the non-supporting elements whereas the l\ norm based 
approach leaves a certain amount of the reconstruction error on the non- supporting elements. 

In the low SNR regime, it is noteworthy that CS-BSD works well although the proposed algorithm was 
originally targeted at a reasonable system having high SNR. For example, CS-BSD achieves MSE=10~ 2 
at SNR=14 dB in Fig|5j which provides 3 dB SNR gain from Ll-DS; 2 dB gain from CoSaMP; 1 dB 
gain from CS-BP and SuPrEM. To support this result, we present Figj6] which describes the iterative 
behavior to find the posterior of xi given z at SNR=10dB. If so,i = 0, most of the probability mass in the 
posterior stays at the zero-spike as shown in Fig|6]-(a); if so,i = 1> the probability mass gradually shifts 
toward an estimated value as shown in Figj6]-(b), over the iteration. Since the SNR is low, the probability 
mass spreads considerbly over the neighbored values due to the noise effect; thus, it can lead to difficulty 
in detecting the state of the signal element using the simple MAP criterion. In CS-BSD, the use of the 
BHT nicely compensates for this weakness of the MAP by scanning the probability mass over the entire 
range of values. 
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2) Comparison over number of iterations: In FigJTJ 



we examine the MSE performance of the BP-based 



algorithms, CS-BP and SuPrEM, as a function of a fixed number of iterations where N/M = 0.5, q = 0.1, 
Nd = 64, and SNR = 50 dB. In this simulation, we used the non-reweighted version of SuPrEM since 
the reweighted version requires more than 10 iterations. The figure demonstrates that CS-BSD converges 
faster than CS-BP and SuPrEM. The convergence of CS-BSD is achieved within 2 to 3 iterations with 
CS-BP, whereas SuPrEM require more than 10 iterations. 



The theoretical and empirical research in this paper demonstrated that CS-BSD is a powerful algorithm 
for sparse signal reconstruction in NCS. In CS-BSD, we employed the DD estimation structure, which 
consists of support detection and signal value estimation. In the support detection process, BP provides 
the signal posterior densities, and then BHT detects the support based on the posteriors. In the signal value 
estimation process, an MMSE estimator provides the signal values using the detected support set. These 
detection and estimation process are iterated until the constraint E||<I>x — z||2 < e is met. The evaluated 
SER performance showed that the support detection of CS-BSD is almost error-free beyond a certain 
threshold SNR according to the undersampling ratio M/N. On the basis of the SER result, we argued 
that CS-BSD achieves the performance of an MMSE estimator that has the knowledge of the support 
set beyond the threshold SNR. We supported the argument by evaluating the MSE performance. The 
complexity of CS-BSD is 0(iV log N + KM), which includes the cost of MMSE 0(KM), in addition 
to that of BP, 0(N log N). Although our algorithm incurs an additional cost for MMSE estimation, it 
converges faster than other BP-based algorithms, so the computational cost is lower in practice. 



Proof: We define a random vector x-N c (j) = [ x N c (j),ii ■■■■> x N c {o),w\ consisting of the signal elements 
associated with Zj and the corresponding index set W := {!,-••, W}, where W := \Nc(j)\. With a 



VI. Conclusion 



Appendix A 



Proof of Theorem 1 



bijective mapping function g : Nc(j) — > W, each element of Xjy c y) corresponds to 



Xk = x N c {j), g (k) for all k G N c (j). 



(37) 



By marginalizing over rij to f z .(z\xi), we obtain 




(38) 
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where f n . (n|xj) = f nj (n) since n is independent of x. By further marginalizing over elements of Xjv c m. 
we rewrite the expression in ([38]) as 



fz 3 (z\xi)= j f /^(^x^y^n^/n^n^ (39) 

where we assume xi = XN c (j)i without loss of generality. In addition, f Zj (z\x.^ c fj\,rij) = 5(z — 
Zj) holds true since knowing X-n c (j) * s equivalent to knowing {&*-) row ^y, thus, there is no uncer- 
tainty in zj = (&x) row rj\ + nj. Since the elements of x are assumed be independent, we replace 



fx. NcU) \x NcU) i( x 2, ■■■,xw\%i) in (39 1 with the product of the probability densities. 

fzj ( z j\Xi) 



J J s ( z ~ z j)fn 3 {n) I f[ fx NcUhni (x w )(dx w ) \dn (40) 

n. r„ iw \w=2 J 



The expression in (40) can be represented by a sequence of convolutions of probability densities, as given 



in(20l 
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TABLE I 

Comparison of several recent sparse recover algorithms 



Algorithm 


Complexity for recovery 


Type of $ 


Prior type 


Algorithm type 


CS-BSD 


0{NlogN + KM) 


sparse-Bernoulli 


spike-and-slab 


MMSE, BP, BHT 


CS-BP 


0(N log N) 


sparse-Bernoulli 


two-state Gaussian mixture 


MAP , BP 


SuPrEM 


0(N log N) 


Low-density frame 


Jefferys', Sparsity K 


MAP, BP, EM 


BCS 


0{NK 2 ) 


Gaussian 


Gamma 


MAP, BP, EM, 


CoSaMP 


0(MN log K) 


Gaussian 


Sparsity K 


Greed pursuit 


Ll-DS 


Q(N 3 ) 


Gaussian 




CVX opt. via LP 
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Algorithm 1 CS-BSD 

Inputs: Noisy measurements z, Sensing matrix <E», Priori density f x (x), density of noise element f nj {n). 
Outputs: Reconstructed signal x, Detected support set s. 

^Initialization: 

set I = 0, e 

set b^=° = 1 for all (i,j) G £ 

set 7 = q/(l -q) 

while E||<&x' — z || 2 > e do 

set / = / + 1 

2) Support Detection: 



n b&], and 



k£N v (i)\{j} 




® 4 

*eJVcCi)\{0 




for all (i, j) G £ 



set /^.(^Iz) =V fx(x)x n 
for i = 1 to iV do 




for all i G V 




< 7 then set = 1 



end if 



end for 



3) Signal Value Estimation: 




x 



if Si = 1 



for all i G V 



set x\ 



o.w. 



end while 
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Support Info. 




Iteration 
Control 



Posterior 



Termination 
condition 



Fig. 1. System model of CS-BSD. 
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Fig. 2. Overall flow of support detection in CS-BSD: A case for N = 6, M = 4, L = 2. 
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Fig. 3. Calculation gap between use of linear convolution and FFT-based convolution in measurement message calculation. 




24 26 28 30 32 34 36 38 



SNR <dB> 



Fig. 4. SER for support detection of CS-BSD over SNR for N = 1024, q = 0.05, L = 4, and N d = 64. The double-lines 
indicate SNRu m it and the downarrow-lines denote the SNR threshold of the support detector. 
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SNR <dB> 



Fig. 5. MSE comparison over SNR for N = 1024, q = 0.05, M/N = 0.5, and N d = 64 where MSE* denotes the MSE of 
the MMSE estimator which has the support knowledge. 




Fig. 6. Iterative behavior to find posterior of Xi at SNR=10dB: (a)when So,j = 0, (b)when so,j = 1. The dotted-red line 
indicates the posterior density after 5 iterations. 
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Fig. 7. MSE performance of BP-based algorithms over the number of iterations for N — 1024, M/N = 0.5, q — 0.1, Nd = 64, 
and SNR = 50 dB. 
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